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ABSTRACT 

We present a fully parallelized grid-based parameter estimation algorithm for investigating multidi- 
mensional likelihoods called Snake, and apply it to cosmological parameter estimation. The basic idea 
is to map out the likelihood grid-cell by grid-cell according to decreasing likelihood, and stop when a 
certain threshold has been reached. This approach improves vastly on the "curse of dimensionality" 
problem plaguing standard grid-based parameter estimation simply by disregarding grid-cells with 
negligible likelihood. The main advantages of this method compared to standard Metropolis-Hastings 
MCMC methods include 1) trivial extraction of arbitrary conditional distributions; 2) direct access 
to Baycsian evidences; 3) better sampling of the tails of the distribution; and 4) nearly perfect par- 
allelization scaling. The main disadvantage is, as in the case of brute-force grid-based evaluation, a 
dependency on the number of parameters, 7V par . One of the main goals of the present paper is to 
determine how large A^ par can be, while still maintaining reasonable computational efficiency; we find 
that Afpar = 12 is well within the capabilities of the method. The performance of the code is tested 
by comparing cosmological parameters estimated using Snake and the WMAP-7 data with those ob- 
tained using CosmoMC, the current standard code in the field. We find fully consistent results, with 
similar computational expenses, but shorter wall time due to the perfect parallclization scheme. 
Subject headings: cosmic microwave background — cosmology: observations — methods: statistical 



1. INTRODUCTION 



Cosmological models are described in terms of a mod- 
est number of cosmological parameters that reflect the 
underlying physical processes of the universe. These 
are toda y routinely m e asure d by experiments such as 
WMAP (jJarosik et all 120111). Planck (2 11) and the 



Sloan Digital Sky Survey ([York et al.ll2000i ) through like- 
lihood techniques. 

The most popular parameter estimation algorithm in 
the c osmology community t o date is the CosmoMC pack- 
age (jLewis fc Bridle! I2002D . which maps out the cos- 
mological parameter space using a Metropolis-Hastings 
Markov Chain Monte-Carlo (MCMC) sampler. The com- 
putational cost of this method is almost exclusively de- 
termined by the external evaluation of the likelihood, 
which typically takes a few seconds per evalution; the 
expense of the internal book-keeping operation is com- 
pletely negligible compared to this. A complete analysis 
of current data sets typically requires O(10 5 ) evaluations, 
resulting in an overall computational cost of 100-10,000 
CPU hours, depending on the particular problem. 

This process can be sped up in two fundamentally dif- 
ferent ways, namely either by reducing the cost per like- 
lihood evaluation, or by reducing the number of like- 
lihood evaluations required, and both cases have al- 
ready been explored extensively in the l iterature. Ex- 
ampl es of the former include CMBFit (jSandvik et al.l 
l200l. PICO (lEendt fc Wandeltl 120071), COSMONET 
(jAuld et all 120071). sparse grids (IFrommert et al.l 120101) 
and PkANN (jAgarwal et all 120121) . all of which essen- 
tially build up a library of known cosmological models 
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given a set of parameters, and interpolate within this li- 
brary using some statistic al method. Exam ples of the 
latter include Mul tiNest (|Feroz et al.l 120091) and APS 
(|Daniel et al.l|2012T ). both of which reduce the number 
of likelihood evalutions through more efficient sampling 
algorithms than the Metropolis-Hastings sampler. 

In this paper, we present an algorithm that falls in the 
last category, aiming to reduce the total number of like- 
lihood evaluations rather than the cost per evaluation. 
The initial idea of this paper is based on the following rea- 
soning: If the problem under consideration involved only 
a one-dimensional likelihood, the mapping algorithm of 
choice would be obvious - one would simply evaluate 
the likelihood over a one-dimensional grid. The result- 
ing function is both easier to work with than a set of 
samples, as produced by a MCMC algorithm, and more 
accurate. Furthermore, it generally requires fewer eval- 
uations, because whereas an MCMC approach builds up 
the shape of the distribution by counting how many sam- 
ples fall in a given parameter range ("bin"), the direct 
approach only needs to evaluate the likelihood in a given 
bin once. In other words, the MCMC approach spends 
most of the time evaluating the same likelihood points 
over and over again, which can give the direct evaluation 
approach a computational edge. 

The vast majority of two-dimensional likelihoods are 
also mapped by grid methods rather than MCMC meth- 
ods, while for three or four dimensions, the preferred 
approach is not clear. However, for higher dimensions, 
virtually all cases are so far handled by MCMC meth- 
ods. At this stage, the so-called "curse of dimensional- 
ity" becomes highly relevant, as the number of likelihood 
evaluations depends exponentially on the number of di- 
mensions. For instance, computing 100 grid points in 
each of five dimensions requires 100 5 evaluation, which 
is generally far too many for most problems. 
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However, in this paper we point out that this is not nec- 
essarily true. The point is simply that the vast majority 
of the high-dimensionality volume typically has negligi- 
ble likelihood, and therefore does not need to be evalu- 
ated in the first place. The trick is to figure out which 
grid cells are relevant and which are not. If this can be 
done both efficiently and robustly, all the useful proper- 
ties of normal grids are retained, and computational cost 
is not compromised. Further, by virtue of not being a 
Markov chain, the algorithm parallelizes trivially, lead- 
ing to shorter overall computational wall time, which is 
often even more critical for a given analysis problem than 
the total CPU time. 

2. THE SNAKE ALGORITHM 

2.1. Algorithm 

The Snake algorithm is very simple, and easily ex- 
plained in terms of a few basic steps. To do so succinctly, 
it is useful to first define some terminology: 

The grid — The Snake algorithm operates in a virtual 
grid defined by an origo, #o and a grid cell size, A(9, 
for each parameter. All points in parameter space are 
referred to and stored as an 7V par array on the form 
#o + kA#, where k is an integer vector describing the 
multidimensional bin number. 

The surface — Each point in the grid can be assigned to 
one of three groups, namely external, internal and sur- 
face points. External points are those that have not yet 
been considered; internal points are those for which like- 
lihoods have been computed for both the point itself and 
all its neighbors; surface points are points that have been 
considered, but have at least one unexplored neighbor. 

The repository — Each considered parameter point is 
stored as an object in a data structure called the repos- 
itory. This is a two-dimensional dynamic list in which 
each row defines one grid point, and contains k, the like- 
lihood value, the locations of its neighbors in the linked 
list, and a logical flag specifying whether the point is 
currently on the surface. 

Given these definitions, the Snake algorithm may be 
summarized as follows: 

1. Initialization: Insert 9q into the repository. 

2. Evaluation: Consider the surface point with the 
highest likelihood, h, and randomly pick one of its 
unexplored neighbors, h + A. Evaluate £(#h+A)- 

3. Surface update: If h has no more unexplored neigh- 
bors, set its surface flag to false. Insert h + A into 
the repository; if all its neghbors have already been 
explored set its surface flag to false. 

4. Convergence check: If log£(0bcst-fit) — log£(#i) is 
smaller than a predefined threshold for all surface 
points, i, then exit. If not, go to (2). 

This stepping procedure leads to two distinct phases. 
First there is a burn-in period in which the solver per- 
forms a greedy maximum-likelihood search. Then, once 
the maximum has been found, the area around the peak 
is investigated such that the surface grows outwards ac- 
cording to the underlying likelihood distribution, until 



the threshold is reached. A larger threshold ensures that 
the tails are investigated more closely, but also means 
that more evaluations need to the performed which is 
computationally expensive, thus the threshold should be 
kept low, though still making sure the edges are properly 
investigated. 

The likelihood evaluation is by far the most time con- 
suming part in cosmological evaluations. This implies 
that the Snake algorithm can be very efficiently paral- 
lelized. In the current implementation we have adopted 
a master-slave parallclization strategy, in which one pro- 
cessor maintains the repository, and the remaining pro- 
cessors only perform likelihood evaluations for parame- 
ters provided by the master. This ensures both a simple 
implementation as well as close to perfect speed-up; after 
only a few initial iterations there are always more than 
enough available surface points to keep all processors oc- 
cupied. Moreover, the communication between the mas- 
ter and slave is minimal, consisting only of a parameter 
multiplet and a likelihood return value. 

As should be clear from the above, Snake is algorith- 
mically trivial; this is nothing but an old-fashioned likeli- 
hood grid evaluation. The only somewhat intricate part 
is to implement efficient book-keeping, which is neces- 
sary in order to maintain computational efficiency as the 
number of data elements, V, in the repository increases. 
For this purpose, we implement dictionaries, based on 
the C++ standard map template. These maps store 
the combination of two values, the key and the mapped 
value, and enable access to the mapped value by using 
the corresponding key in constant time, as opposed to 
0(V) for unsorted lists or 0(\ogV) for sorted lists. 

Two such maps are implemented for the master in 
Snake. The first is for keeping track of which points 
in parameter space correspond to which iteration, and is 
used to check if the neighbors of the current point have 
already been visited, and if so returns their iteration in- 
dex. The second map keeps track of which likelihood 
value corresponds to which iteration and is sorted ac- 
cording to descending likelihood such that the first point 
on the list will always be that with the highest likelihood, 
and therefore the index of the maximum likelihood on the 
surface is just the mapped value at the top of the map. 
When a point becomes an interior point the correspond- 
ing entries are removed from the two maps in order to 
keep these as short as possible and to avoid getting stuck 
at the overall maximum likelihood. 

2.2. Walk-through of two-dimensional example 

Before testing the algorithm on realistic cases, it is use- 
ful to walk through it stcp-by-stcp for a simple case, to 
gain some intuition for its behaviour. In this section, we 
therefore first consider the small two-dimensional exam- 
ple illustrated in Figurc[T]and Tablc[T] The unknown dis- 
tribution to be mapped is marked in Figure Q] by dashed 
lines, corresponding to 1, 2 and 3ct contours, and the 
threshold to be reached is defined as the 3ct contour. 

First, we initialize the code at (0,0), which in this 
case happened to lie slightly below and to the left of the 
maximum-likelihood point. We evaluate the likelihood, 
and insert this point into the first row of the repository 
(Table [1]). At this stage, the first four columns are fi- 
nalized, the surface flag is set to true, and none of the 
neighbor indices (indicated by the ind array of length 
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Fig. 1. — A snapshot of a made up likelihood distribution after 
29 iterations illustrating the repository of Snake. Beige boxes are 
surface points, red are internal points, blue is the overall maximum 
likelihood, and green is the maximum likelihood on the surface. 
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2-/Vpar) are set, indicating that no neighbors have been 
evaluated yet. 

Second, as specified by the algorithm we now find the 
surface point with the highest likelihood, which of course 
is the point just added. We select one of its neighbors, 
which in this case happened to be ( — 1,0). We evaluate 
its likelihood, and insert this new point into the second 
row of the repository. We update the neighbor indices 
of both this new point and the original point to point 
to each others main index. We then repeat this process 
over and over again, adding more and more points to 
the repository, until the smallest difference between the 
likelihood of the overall maximum-likelihood point and 
that of any point on the surface is larger than a pre- 
defined threshold. 

TablcUJgives a snap-shot of the repository (parameters, 
likelihood, current status of the ind array and the sur- 
face flag) at iteration number 29, matching the illustra- 
tion seen in Figure [TJ The beige boxes correspond to the 



3 




x index x index 



Fig. 2. — Two-dimensional illustration of Snake's sampling 
method. Left: The (unnormalized) target likelihood. Right: The 
path Snake takes through parameter space. It finds the closest 
peak, investigates the area around this peak, discovers the second 
peak, investigates the area around this one, and finally explores 
the joint boundary of both peaks. 

points in parameter space which lie on the surface, red 
boxes are interior points and the blue box corresponds 
to the overall maximum likelihood. The green box is the 
parameter point on the surface with the highest likeli- 
hood and will be the start point for the next iteration. 
The numbers inside the boxes correspond to the iteration 
index, thus the path Snake takes to reach the maximum 
likelihood can be see, as well as the relation between 
neighbors and the values of the first and last eight points 
quoted in the ind array in Table [TJ Iterations which have 
all ind columns filled have their surface flag set to false 
and the point no longer exists in the maps. The process 
continues until all boxes touching the 3er contour have 
turned red, after which the surface lies fully below the 
threshold. 

2.3. Exploration of double-peaked likelihood 

A second illustration of how Snake investigates param- 
eter space is given by the double-peaked 2-dimensional 
likelihood 

£ = A lC h( x - C^ 1 (x- v-t) _|_ y i l2C i(x-/j,2) T C 2 _1 (x-/j,2) 

where x is the 2-dimcnsional parameter vector, A\ and 
Ai are the peak amplitudes, C\ and C2 the corresponding 
covariance matrices and fix and fi2 the vectors of the 
means. 

The leftmost plot of Figure [2] shows a likelihood dis- 
tribution that can be described by this equation for a 
particular set of covariance matrices and means. The 
path Snake takes in the two-dimensional parameter space 
is shown in the rightmost plot of Figure and as can 
be seen Snake quickly finds the maximum likelihood of 
the closest peak, and then proceeds by investigating the 
area around this peak by visiting neighbors of the sur- 
face point with highest likelihood. When the likelihood 
being investigated falls to the value corresponding to the 
intersection of the two peaks Snake makes its way to 
the second peak, and continues by investigating the area 
around this peak in the same manner as the first peak. 
Once Snake returns to the likelihood equal to that at the 
intersection it will investigate the points around both 
peaks until the desired threshold is reached. 

Note that if the two peaks had been so far apart that 
the likelihood at the intersection fell below the threshold 
cutoff the second peak would remain undiscovered. This 
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TABLE 2 

COSMOLOGICAL PARAMETERS 



Parameter CosmoMC Snake Shift in a 
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+0.00632 
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-0.01426 


0.08 


n a 


0.9682 


+ 0.0138 
-0.0136 


0.9681 


+0.0139 
-0.0138 


0.07 


log[10 10 .4 s ] 


3.082 


+0.034 
-0.035 


3.080 


± 0.035 


0.06 



Note. — Comparison of best-fit parameters derived by CosmoMC 
and Snake from the 7-year WMAP data. 



problem can be solved in the same way as for standard 
Metropolis-Hasting samplers: Run several Snakes in par- 
allel with different initial positions. Once two indepen- 
dent Snakes touch for the first time, merge the reposito- 
ries and the CPU working groups into one master-slave 
organization. 

3. ACCURACY AND EFFICIENCY WITH INCREASING 
DIMENSIONALITY 

The main outstanding question regarding the Snake 
algorithm is how well it scales with the number of di- 
mensions in terms of efficiency. To study this question 
quantitatively, we consider a correlated Gaussian likeli- 
hood on the form 

£ = c ^ x ~^ Tc (2) 

where x, C and \x arc the multidimensional parameter 
vector, covariance matrix and vector of means, respec- 
tively. Both the mean and standard deviation of dimen- 
sion number i are arbitrarily chosen to be i. 

Our goal is now to map out this distribution in iVp ar 
dimensions, and determine the maximum number of di- 
mensions that can be probed with high accuracy using 
reasonable computational resources. To do so, we im- 
pose a limit on the number of likelihood evaluations of 
TV = 10 6 , a typical number for modern cosmological anal- 
yses. The grid cell width in dimension i is chosen to be 
8i x 7V 1 ' iv P ar , corresponding to distributing the N evalu- 
ations roughly over a grid covering roughly — 4<r to +4cr 
in each of the iVp ar dimensions. (Of course, the actual 
shape probed by Snake will not be a rectangular grid, 
but rather conform to the shape of the underlying distri- 
bution.) We then run the algorithm for increasing A par , 
and compare the resulting marginals to the known an- 
alytic input marginals; once the combined error in the 
derived mean or standard deviation is larger than O.Ict, 
we consider the algorithm to have broken down as a result 
of the sparse sampling of the underlying distribution. 

In Figure [3] we plot the combined mean and standard 
deviation errors averaged over the number of dimensions, 
7Vp ar , as a function of iV par . Here we clearly see that for 
N pax < 12, the algorithm recovers the true distribution 
with high accuracy. Of course, given more computational 
resources these errors can be decreased arbitrarily, but 
since the cost faces an exponential growth with increasing 
JVpar, it seems reasonable to define the operational range 
for Snake to be A par < 12 - 15. 

4. 7- YEAR WMAP LIKELIHOOD ANALYSIS 
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Fig. 3. — Combined mean and standard deviation errors aver- 
aged over number of dimensions (solid) showing the 0.1<r cuttoff 
(dashed). 




T n s log[10 ,0 A s ] 



Fig. 4. — Marginal cosmological parameter distributions derived 
with Snake (dashed red line) and CosmoMC (solid black line) from 
the 7-year WMAP likelihood. 

We now apply this method to the 7- year WMAP like- 
lihood, and estimate cosmological parameters within the 
well-established 6-par ameter ACDM concordance model 
(jKomatsu et el.l I2011I ) . The parameter set of choice is 
fih/i 2 , {loAih 2 , 9, t, n s and log[10 10 ^4 s ]. The same setup 
is analyzed using both Snake and CosmoMC for compar- 
ison purposes. 

The resulting normalized marginal distributions are 
shown in Figure [H and means and standard deviations 
are tabulated in Table [2j The agreement between the 
two methods is excellent, with a maximum difference be- 
tween the two methods corresponding to a 0.08cr shift in 
r and 0.07<t shift in n s . 

The CosmoMC results were obtained with an MPI con- 
vergence criterion of 0.03, while the Snake convergence 
threshold was defined to be -6.0. Both codes were run 
on 50 CPUs, and the resulting wall times were 1.42 and 
1.24 hours, respectively. 

4.2. Model selection by Bayesian evidence 

A significant advantage of Snake over CosmoMC 
is its direct acces s to the Bayesian evidence (e.g., 
IGelman et al.|[2003T) . For a given model H with parame- 
ters 9 and data d, this is simply the normalization factor, 
E = P(d\H), in Bayes' theorem, 
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The other factors are the likelihood, C(6\H) = P(d\8, H), 
the prior, P(8\H), and the posterior, P(6\d, H). Different 
models can be compared in terms of their evidence, which 
for a model, H n , is given by 



P(d\H n ) = J P(d,9\H n )d0 = I P(d\6,H n )P(6\H n )d0, 
a n 

( 4 ) 

where P(d,8\H n ) is the joint probability distribution of 
d and 9 given this model over all of parameter space, fl 
with step sizes of dd. 

Calculating the evidence for different models using re- 
sults from Snake is rather straightforward as the param- 
eter space is gridded into even cells of volume J dd. The 
integral in equation [4] becomes a sum of the likelihood 
values within the threshold multiplied by the volume of 
one grid cell, where we assume a uniform prior which 
gives a factor of 1/L for each parameter, where L is the 
range for each parameter. 

To compare two different models, Hi and H2, it is 
common to consider the quantity 



5\ogE = log£i — \0gE2 



(5) 



where E\ and E2 are the evidences of models Hi 
and i?2, respectively. The larger the value of 
S\ogE the higher the evidence in favour of model 
E\. To calibrate th is quantity, one commonly adopt s 
the Jeffreys' scale (jLiddle et all I20M iTrottal l2008h . 

{1 evidence for Ei is substantial 
2.5 evidence for Ei is strong 
5 evidence for E% is decisive. 
However, one should note that this scale only provides 
a general guideli ne, and conclusions can be appli cation 
specific; see, e.g.. iNesseris fc Garcia-Bellidol (|2012[ ) for a 
recent discussion of this issue. 

We now evaluate the evidence for both the standard 
six parameter model described above and for a reduced 
model obtained by enforcing n s = 1. We find that 
the individual evidences are Ei = —3743.16 and E2 = 
—3744.56, respectively, with an estimated uncertainty 
in each of 0.1. This corresponds to Alog-E of 1.40 in 



favour of the 6-parametcr model; the full model therefore 
provides a better fit to the data, even when accounting 
for the larger parameter v olume. Similar resu lt s hav e 
already been published by iParkinson fc Libbld (|2012l ). 
Note that given the full multi-dimensional Snake like- 
lihood, evaluation the evidence of all nested models is 
trivial by similar calculations. 

5. SUMMARY AND OUTLOOK 

In this paper we have described a simple grid-based 
estimator for multi-dimensional likelihoods. This algo- 
rithm exploits the fact that by far most of the iV pal - 
dimensional parameter volume in a general likelihood has 
negligible contributions, and spends its computational 
resources only where the likelihood itself is significant. 
However, in contrast to standard MCMC methods, it 
only considers each parameter point once, relying on the 
actual value of the likelihood. 

The main advantages of this method are 1) trivial ex- 
traction of arbitrary conditional distributions; 2) direct 
access to Bayesian evidences; 3) better sampling of the 
tails of the distribution; and 4) nearly perfect parallcliza- 
tion scaling. The main disadvantage is a computational 
cost increasing exponentially with N paT . However, we 
have shown that the algorithm is fully capable of prob- 
ing at least N par < 12—15 with reasonable computa- 
tional resources, which is sufficient for current cosmolog- 
ical models. 

In the current implementation the total cost of the 
method is comparable to that of CosmoMC for similar 
convergence criteria. However, the cost for a full Snake 
analysis can be vastly reduced by introducing adaptive 
grids, in which the grid cell depends on the local proper- 
ties of the likelihood, such that high-significance regions 
are sampled more densely than the tail regions. The 
results from this extension will be reported in a future 
publication. 

The computations presented in this paper were car- 
ried out on Titan, a cluster owned and maintained by 
the University of Oslo and NOTUR. HKE acknowlcgdcs 
support from the ERC Starting Grant StG2010-257080. 
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